W228-W231 Nucleic Acids Research, 2012, Vol. 40, Web Server issue 
doi:10.1093/nar/gks592



Published online 12 June 2012 



pKNOT v.2: the protein KNOT web server

Yan-Long Lai¹, Chih-Chieh Chen¹ and Jenn-Kang Hwang¹,²,*

¹Institute of Bioinformatics and Systems Biology, National Chiao Tung University and ²Center for Bioinformatics
Research, National Chiao Tung University, Hsinchu 30068, Taiwan, Republic of China

Received February 17, 2012; Revised May 12, 2012; Accepted May 26, 2012



ABSTRACT 

Knotted proteins have recently received considerable
attention because of their topological novelty and their
puzzling folding mechanisms. We previously published the
pKNOT server, which provides a structural database of
knotted proteins, tools for detecting and analyzing knotted
regions in structures, and a Java-based 3D graphics viewer
for visualizing knotted structures. However, a convenient
platform for performing similar tasks directly from protein
sequences has been lacking. In the current version of the
web server, referred to as pKNOT v.2, we implement a
homology modeling tool so that the server now accepts
protein sequences, in addition to 3D structures or Protein
Data Bank (PDB) IDs, and returns a knot analysis. In
addition, we have updated the database of knotted proteins
from the current PDB using a combination of automatic and
manual procedures. We believe that the updated pKNOT
server, with its extended functionality, will provide
better service to biologists interested in research on
knotted proteins. pKNOT v.2 is available at
http://pknot.life.nctu.edu.tw/.

INTRODUCTION 

Knotted proteins are interesting not only for their
extraordinary topologies (1,2) but also for their intriguing
folding mechanisms (3-6). Four types of knots have so far
been identified in the protein structures in the Protein
Data Bank (PDB): the trefoil knot (or 3₁ knot), the
figure-eight knot (or 4₁ knot), the 5₂ knot and the
Stevedore's knot (or 6₁ knot). Knotted proteins present a
knotty problem to both experimental and theoretical
biologists: how does a peptide chain thread through loops
to form knots with multiple crossings (up to six)? Recent
experiments showed that some dimeric knotted proteins
appear to have a folding mechanism similar to that of
unknotted proteins (7), and that knotted proteins can
remain in a knotted conformation even in their chemically
unfolded states (8). Computer simulations have been
performed to explore possible folding mechanisms of knotted
proteins (4,5). Protein knots are implicated in substrate
binding and enzyme activity (9). For example, the knot of
N-acetylornithine transcarbamylase is part of the active
site (10), while the knot of TrmD tRNA methyltransferase
has been shown to be important for substrate binding and
catalytic activity (11).

Currently, two web servers are available for the analysis
of knotted proteins: pKNOT (12)
(http://pknot.life.nctu.edu.tw) and KNOTS (13)
(http://knots.mit.edu). Both web servers share similar
functionality: they provide a database of knotted protein
structures, they can analyze structures for possible knots,
and they provide a 3D molecular viewer with which users can
visualize the knotted structures and manipulate their
orientations. In addition, pKNOT provides information about
the smallest possible peptide chain that can form a knot
structure and generates movie files of the knot detection
process for pedagogical purposes (12). These web servers
provide structure-based analysis for users, but they cannot
accept query sequences for knot analysis.

Herein, we implemented a structure modeling module based on
(PS)² (14,15), which uses a consensus strategy in both
template selection and target-template alignment to model
3D structures from homologous sequences. In this way, the
updated pKNOT server can accept a query sequence, build its
3D structure, analyze the structure for possible knots and,
if any are found, return their knot types together with
information such as the knot core and the knot depth (12).

MATERIALS AND METHODS 

As the number of solved 3D protein structures increases, so
does the number of knotted proteins, as shown in Figure 1.
As reported previously (12,16,17), missing residues in 3D
structures or non-standard PDB formats may cause
misidentification of knots by automatic approaches. To
ensure the accuracy of our database of knotted proteins, we
used a hybrid approach: we first scan the PDB using
automatic procedures to generate a smaller set of candidate
knotted proteins (12), and then examine this set by manual
inspection to remove structures with dubious knots.

Figure 1. The yearly growth of knotted proteins in PDB from 1984 to 2011.

Homology modeling and knot detection 

Since the protein families of knotted proteins are limited,
it is feasible to build reliable homologous structures from
protein sequences. 3D structures are modeled from protein
sequences using (PS)² (14,15), previously developed by our
laboratory. The (PS)² method, which takes advantage of a
consensus strategy for template selection and
target-template alignment, compares favorably with most
homology modeling methods (15). pKNOT detects knots using
Taylor's algorithm (1), which is essentially a smoothing
procedure for a 3D curve: it first fixes the protein's N
and C termini, and then repeatedly smooths and straightens
the protein chain. An unknotted structure is reduced to a
simple straight line; for a knotted structure, once the
structural details have been smoothed out, the knot can be
easily detected. Although the method usually converges in
<50 iterations, it occasionally fails to converge within
the given number of iterations, which may cause
misidentification of protein knots. We have therefore
implemented a file, named the Convergence File, that
reports the results at each iteration. The Convergence
File, together with the 3D structure viewer, helps users
verify that the results have converged. In principle, the
knot type can be determined topologically by computing the
corresponding Alexander polynomial (18); in the current
version, however, knot types are identified by manual
inspection. The work flow of pKNOT is shown schematically
in Figure 2.
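
The smoothing step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pKNOT implementation: the function names, the equal-weight averaging of each point with its two neighbours, the tolerances and the exact segment-triangle test are all choices made here for illustration. In particular, the sketch uses an exact intersection test rather than the distance-based collision threshold offered by the server.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def segment_hits_triangle(p, q, a, b, c, det_eps=1e-8, end_eps=1e-6):
    """Moller-Trumbore test: does segment p-q cross triangle a-b-c?"""
    d, e1, e2 = _sub(q, p), _sub(b, a), _sub(c, a)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < det_eps:        # parallel segment or degenerate triangle
        return False
    s = _sub(p, a)
    u = _dot(s, h) / det
    if u < 0.0 or u > 1.0:
        return False
    w = _cross(s, e1)
    v = _dot(d, w) / det
    if v < 0.0 or u + v > 1.0:
        return False
    t = _dot(e2, w) / det
    # Contact only at a segment endpoint (shared vertex) does not count.
    return end_eps < t < 1.0 - end_eps

def move_is_blocked(chain, i, new):
    """True if moving chain[i] to `new` would sweep through another segment."""
    for j in range(len(chain) - 1):
        if j in (i - 1, i):       # the two segments attached to the moving point
            continue
        p, q = chain[j], chain[j + 1]
        if (segment_hits_triangle(p, q, chain[i - 1], chain[i], new) or
                segment_hits_triangle(p, q, chain[i + 1], chain[i], new)):
            return True
    return False

def smooth_chain(coords, max_iter=500, tol=1e-6):
    """Return (smoothed chain, iterations used); both termini stay fixed."""
    chain = [tuple(map(float, c)) for c in coords]
    for iteration in range(1, max_iter + 1):
        largest_shift = 0.0
        for i in range(1, len(chain) - 1):
            # Candidate position: average of the point and its two neighbours.
            new = tuple((chain[i - 1][k] + chain[i][k] + chain[i + 1][k]) / 3.0
                        for k in range(3))
            if not move_is_blocked(chain, i, new):
                largest_shift = max(largest_shift, math.dist(chain[i], new))
                chain[i] = new
        if largest_shift < tol:   # nothing moved appreciably: converged or stuck
            return chain, iteration
    return chain, max_iter

def path_length(chain):
    """Total length of the polyline through the chain points."""
    return sum(math.dist(chain[k], chain[k + 1]) for k in range(len(chain) - 1))
```

For an unknotted chain such as a short helix, the path length of the smoothed chain approaches the distance between the two fixed termini, whereas a knotted chain retains blocked moves and cannot be straightened. Returning the iteration count alongside the coordinates mirrors the role of the Convergence File: a run that stops only at `max_iter` should be inspected before its result is trusted.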

With a modeled structure or a crystal structure available,
the pKNOT server can compute the knot core and the knot
depth of the knotted region of a knotted protein. The knot
core is defined as the smallest region that remains knotted
(1). The knot depth is the product of the numbers of
residues that must be deleted from each end in order to
free the knot (1). Both values provide useful information
for further investigation of the structural characteristics
of the knotted region.
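
The two definitions above can be illustrated with a short sketch. It assumes a hypothetical predicate `is_knotted(segment)` that reports whether a contiguous piece of the chain is still knotted (for instance, a smoothing-based detector); both the predicate and the function name are introduced here for illustration only. The sketch also simplifies the knot core by trimming each end independently, which may differ from how pKNOT determines it.

```python
def knot_core_and_depth(chain, is_knotted):
    """Estimate the knot core and knot depth by trimming residues from each end.

    chain      -- sequence of residues (any sliceable sequence)
    is_knotted -- hypothetical predicate: True if a chain segment is knotted
    Returns ((first_index, last_index), depth) with 0-based inclusive indices,
    or None if the full chain is not knotted.
    """
    if not is_knotted(chain):
        return None
    n = len(chain)

    # Smallest number of N-terminal residues whose deletion frees the knot.
    n_cut = 1
    while n_cut < n and is_knotted(chain[n_cut:]):
        n_cut += 1

    # Smallest number of C-terminal residues whose deletion frees the knot.
    c_cut = 1
    while c_cut < n and is_knotted(chain[:n - c_cut]):
        c_cut += 1

    # The last residue that could not be removed on each side bounds the core.
    core = (n_cut - 1, n - c_cut)
    depth = n_cut * c_cut
    return core, depth
```

A deeply embedded knot needs many deletions on both sides and so has a large depth, while a knot close to either terminus has a small one.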



Input format 

The pKNOT web server accepts two types of input (Figure 3):
in STRUCTURE QUERY, users can either type a PDB ID or
upload a structure file in PDB format; in SEQUENCE QUERY,
users can enter protein sequences in FASTA format. In
STRUCTURE QUERY, several advanced options are available.
Users can select either the IGNORE (default) or the
PRESERVE option for the chain smoothing process, which
determines how missing residues in a protein structure are
handled. The IGNORE option closes each break with the
shortest line segment connecting its two ends, while the
PRESERVE option preserves the breaks in the chain, keeping
the endpoints of each segment fixed. Users can also set the
number of iterations (default: 500) and the collision
threshold (default: 0.5 Å). The collision threshold is the
minimal distance used to determine whether line segments
will intersect during the smoothing procedure (12). Users
are nevertheless advised to try the default values first.
Detecting a protein knot usually takes <10 s, whereas
modeling a homologous structure takes longer, around
3-10 min. For a very long sequence, say, of 2000 amino
acids, modeling may take >20 min.
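
The difference between the IGNORE and PRESERVE options can be sketched as follows. This is an illustration only: the function name and the input layout are hypothetical, and the sketch assumes that a break is recognized by a gap in residue numbering, which may not be how the server detects missing residues.

```python
def chain_segments(residues, mode="IGNORE"):
    """Group C-alpha coordinates into the polylines that would be smoothed.

    residues -- list of (residue_number, (x, y, z)) tuples in chain order
    mode     -- "IGNORE": bridge every break with a straight segment, giving
                one polyline; "PRESERVE": start a new polyline at each break
    """
    if mode not in ("IGNORE", "PRESERVE"):
        raise ValueError("mode must be 'IGNORE' or 'PRESERVE'")
    segments = [[]]
    for k, (number, xyz) in enumerate(residues):
        # Assumption: a jump in residue numbering marks missing residues.
        is_break = k > 0 and number != residues[k - 1][0] + 1
        if is_break and mode == "PRESERVE":
            segments.append([])
        segments[-1].append(xyz)
    return segments
```

With IGNORE, consecutive points on either side of a gap are simply joined, which is the shortest connection between them. With PRESERVE, each returned polyline would be smoothed with its own endpoints held fixed.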

Output format 

Upon a structure query, pKNOT returns results for each
chain of the structure, including its length (CHAIN
LENGTH), the type of knot (KNOT TYPE), the length of each
knot (KNOT LENGTH) and a visualization of each knotted
structure (DISPLAY STRUCTURE). Clicking on KNOT TYPE
returns a complete list of knotted structures. The server
provides a 3D molecular viewer for users to view 3D
structures and manipulate their orientations in space. The
original and the smoothed structures can be visualized
together or individually in the 3D graphics viewer for easy
comparison.

Upon a sequence query, if the 3D structure is successfully
built, the modeled structure is shown in the 3D graphics
viewer and its knot type based on sequence homology is
returned (Figure 3).

Figure 2. The schematic work flow of pKNOT: accepting a query protein sequence, modeling a 3D structure through homology modeling with (PS)² and
smoothing out its backbone for the detection of its knot. The knot shown in this example is a 3₁ knot.

Figure 3. The features of the pKNOT v.2 web server: (A) STRUCTURE QUERY: users can enter a PDB ID or upload a PDB file. (B) SEQUENCE
QUERY: users can enter or upload protein sequences in FASTA format. (C) Users can view the modeled structure in the 3D graphics viewer and inspect
its knot region.



Database 

We have identified 566 knotted structures, almost twice as
many as in the previous version of pKNOT (12). Four types
of knots are currently represented: (i) proteins with
trefoil knots, including (a) methyltransferase,
(b) transcarbamylase, (c) methionine adenosyltransferase,
(d) carbonic anhydrase, (e) pre-mRNA-splicing factor RDS3,
(f) VirC2-like proteins and (g) MJ0366-like protein;
(ii) proteins with figure-eight knots, including
(a) phytochrome, (b) the core proteins of bluetongue virus
and (c) ketol-acid reductoisomerase; (iii) ubiquitin
hydrolase, identified with a 5₂ knot; and (iv) α-haloacid
dehalogenase (PDB ID: 3bjx), the only structure identified
with the most complicated knot currently known, i.e. a
Stevedore's knot. Knotted structures can be classified into
10 SCOP folds, comprising 17 SCOP families (2). However, it
should be noted that, as of the date of writing, 3bjx has
not yet been classified in SCOP. Users can download the
complete list of knotted proteins.



DISCUSSION 

Herein, we present an updated version of the pKNOT web
server. The updated database of knotted structures is
almost double the size of the previous version. Each of the
566 knotted structures has been manually validated to
reduce the false positives that usually plague fully
automated detection systems. One of the unique features of
pKNOT v.2 is the integration of homology modeling with the
existing knot detection and analysis functions, such that
the updated server can accept protein sequences as well as
protein structures. We believe pKNOT v.2 will prove more
useful to biologists than its previous version.